Abstract

The ribosomal DNA (rDNA) of eukaryotes is organized as large tandem arrays. Here, we compare the genomic locations of rDNA 
among yeast species and show that, despite its huge size (> 1 Mb), the rDNA array has moved around the genome several times within 
the family Saccharomycetaceae. We identify an ancestral, nontelomeric, rDNA site that is conserved across many species including 
Saccharomyces cerevisiae. Within the genus Lachancea, however, the rDNA apparently transposed from the ancestral site to a new 
site internal to a different chromosome, becoming inserted into a short intergenic region beside a tRNA gene. In at least four other 
yeast lineages, the rDNA moved from the ancestral site to telomeric locations. Remarkably, both the ancestral rDNA site and the new 
site in Lachancea are adjacent to protein-coding genes whose products maintain the specialized chromatin structure of rDNA (HMO1
and CDC14, respectively). In almost every case where the rDNA was lost from the ancestral site, the entire array disappeared without 
any other rearrangements in the region, leaving just an intergenic spacer of less than 2 kb. The mechanism by which this large and 
complex locus moves around the genome is unknown, but we speculate that it may involve the formation of double-strand DNA 
breaks by Fob1 protein or the formation of extrachromosomal rDNA circles. 
Introduction

The structural RNA components of the ribosome are the most 
abundant RNA molecules in most organisms, and there is a 
direct correlation between ribosomal RNA (rRNA) abundance 
and growth rate in many microbes (Gourse et al. 1996; Rudra
and Warner 2004). High concentrations of the rRNA mole- 
cules are achieved not only by high levels of transcription but 
also by the presence of multiple copies of each gene. In yeasts 
related to Saccharomyces cerevisiae, the genes for the four 
structural RNAs (5S, 18S, 5.8S, and 25S) are located beside
one another in a unit that is repeated tens or hundreds of 
times in tandem to form one or more large arrays. In 
S. cerevisiae, the array is estimated to be approximately
1.4 Mb long. It contains approximately 150 copies of a
9,081-bp repeating unit, located at a single site on chromosome
XII and accounting for approximately 10% of the size of
the genome (Schweizer et al. 1969; Kobayashi et al. 1998). 
The sequences of the ribosomal DNA (rDNA) units within the 
array are homogenized by highly efficient concerted evolu- 
tion, resulting in very little sequence variation among the dif- 
ferent copies (Ganley and Kobayashi 2007, 2011). The 
organization of rDNA in most other eukaryotes is similar to 
that in fungi except that most have separate arrays of the 5S
gene (which is transcribed by RNA polymerase III) and the 35S
gene (which is transcribed by RNA polymerase I as a 35S pre- 
cursor that is cleaved to make the mature 18S, 5.8S, and 25S 
rRNAs). In some eukaryotes, the 5S gene is coamplified in 
an array with other repeated genes such as histones 
(Bergeron and Drouin 2008). During fungal evolution, there 
have been several incidences of inversion of the 5S gene's 
orientation relative to the 35S gene within the array, and of
gain or loss of 5S gene copies from the array (Bergeron and 
Drouin 2008). 
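
As a quick check on the figures quoted above, the array length follows directly from the copy number and the unit size. The sketch below uses only the numbers given in the text (about 150 copies of a 9,081-bp unit, making up about 10% of the genome). The variable names are illustrative, and the implied genome size is a back-calculation from those rounded values rather than a measured figure.

```python
# Back-of-envelope check of the S. cerevisiae rDNA array size.
UNIT_BP = 9_081          # length of one rDNA repeat unit (from the text)
COPIES = 150             # approximate copy number (from the text)
GENOME_FRACTION = 0.10   # approximate share of the genome (from the text)

array_bp = UNIT_BP * COPIES
array_mb = array_bp / 1_000_000
# Back-calculated from rounded inputs; illustrative only.
implied_genome_mb = array_mb / GENOME_FRACTION

print(f"array length: {array_bp:,} bp (~{array_mb:.2f} Mb)")
print(f"implied genome size: ~{implied_genome_mb:.1f} Mb")
```

The product is about 1.36 Mb, consistent with the estimate of approximately 1.4 Mb.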

Although the concerted evolution of eukaryotic rDNA is 
well known, less attention has been paid to the location of 
the rDNA array(s) within genomes and to whether (and how) 
this location can change during evolution. One probable 
reason for the lack of study is that in most eukaryotes, the 
rDNA is located in either subtelomeric or pericentromeric het- 
erochromatin (Long and Dawid 1980; Eickbush TH and 
Eickbush DG 2007). Synteny is generally not conserved in 
these regions, so no inferences can be drawn about the evo- 
lution of rDNA location, although cytogenetic studies have 
found that the locations and number of rDNA arrays can be 
quite variable within some animal and plant genera (Shishido 
et al. 2000; Datson and Murray 2006; Cazaux et al. 2011). 
Even in eukaryotes with small genomes, the rDNA is usually
located near telomeres (Torres-Machorro et al. 2010), and its 
location may play a role in the protection of chromosome ends 
in some genomes (Nosek et al. 2006; Silver et al. 2010). In 
S. cerevisiae, however, the rDNA is located at an internal site
on a chromosome, approximately 450 kb from the left telo- 
mere and 610 kb from the right telomere of chromosome XII. 
Altering this location in laboratory experiments (by splitting 
chromosome XII on one or both sides of the rDNA array, to 
form two or three new chromosomes) was found to have 
significant negative effects on replicative lifespan (Kim et al. 
2006), which suggests that natural selection may act to 
optimize rDNA location. Here, we use synteny conservation 
among yeast species to provide a high-resolution view of 
how rDNA locations can change. We show that the 
chromosome XII rDNA site is ancestral to many yeast species 
but that the rDNA has moved away from this site in several 
lineages. 



Materials and Methods 

For the seven species sequenced in our laboratory by 
Roche-454 pyrosequencing (Gordon et al. 2011), we assembled
a consensus sequence for the rDNA unit from numerous
small contigs and searched for overlaps between this consen- 
sus and the ends of genomic sequence scaffolds. We also used 
paired sequencing reads (3 kb, 8 kb, and 20 kb libraries) to 
establish linkages between the rDNA and the rest of the 
genome. The 18S, 5.8S, 25S, and 5S gene structures were 
inferred by BLASTN searches with the S. cerevisiae genes as 
queries. In Tetrapisispora blattae, the rDNA consensus overlapped
with the telomeric end of a long (17 kb) sequence that
is almost identical between the ends of chromosomes 4 and 5, 
and we assumed that rDNA arrays are located on both chro- 
mosomes. The Vanderwaltozyma polyspora genome assembly 
is incomplete and consists of 41 scaffolds (Scannell et al. 
2007). There are rDNA arrays at the ends of two scaffolds, 
of which one is colinear with the ancestral site on one side 
(fig. 1) and the other appears to be subtelomeric. For L. waltii, 
the rDNA array was assembled and mapped to chromosome 8 
by Di Rienzi et al. (2011, 2012). We inferred that it lies in a gap
between scaffolds s0 and s34 on this chromosome (Kellis et al.
2004), based on an overlap with scaffold s34, which makes it 
colinear with the organization in L. thermotolerans (Souciet 
et al. 2009). For Candida glabrata, the genome sequence includes
an annotated rDNA locus at the telomere of chromosome
12R and a second unannotated incomplete locus at
telomere 13R (Dujon et al. 2004; Muller et al. 2009). rDNA 
locations in the other species were inferred and annotated by 
the original authors (Johnston et al. 1997; Dietrich et al. 
2004; Dujon et al. 2004; Souciet et al. 2009; Wendland and 
Walther 2011). 



Results 

An Ancestral rDNA Location in Saccharomycetaceae 

We compared rDNA locations in 17 yeast species of family 
Saccharomycetaceae, including nine whose ancestor under- 
went whole-genome duplication (WGD) and eight that di- 
verged from the S. cerevisiae lineage before the WGD 
occurred. We used the Yeast Gene Order Browser (Byrne 
and Wolfe 2005, 2006) and the inferred ("Ancestral") gene 
order that existed in the common ancestor of all post-WGD 
species (Gordon et al. 2009) to study synteny relationships in 
the neighborhood of the rDNA in each species. In well-studied 
genomes such as S. cerevisiae and Eremothecium gossypii, the 
complete rDNA units in the array are known to be flanked by 
incomplete or rearranged units (Johnston et al. 1997; Dietrich 
et al. 2004). In other species whose genomes have been se- 
quenced by shotgun or next-generation technologies, the 
exact structure of the junctions between the rDNA array 
and the neighboring nonrepetitive DNA have often not 
been determined, but the location and orientation of the 
rDNA has been inferred from paired sequencing data where 
one sequence read is in rDNA and the other is unique. There is 
only one rDNA locus in the genome sequence of 14 species 
and two loci in the other three species (T. blattae, V. polyspora,
and C. glabrata).

We found that 10 of the 17 studied species (six post-WGD
and four non-WGD) share a syntenic location for their rDNA
arrays, which can therefore be inferred to be an ancestral 
rDNA location predating the WGD (fig. 1). Compared with 
the Ancestral yeast genome (Gordon et al. 2009), the location 
of the ancestral rDNA array is between genes Anc_8.371 
(ARG82) and Anc_8.372 (HMO1). This location is internal to
ancestral chromosome Anc_8, which contained 879 genes, so
the rDNA is far from both telomeres. This ancestral rDNA
location is maintained in the non-WGD species E. gossypii, 
E. cymbalariae, and Lachancea kluyveri and also in 
Kluyveromyces lactis once an inversion of the neighboring 
region on one side is taken into account. The WGD event 
duplicated ancestral chromosome 8, forming two daughter 
chromosomes that we refer to as Anc_8A and Anc_8B 
(fig. 1). We can infer that after WGD, the rDNA array was 
retained on Anc_8A but lost from Anc_8B, becoming single 
copy like many of the protein-coding genes in the region. Of 
the nine post-WGD species, six retain rDNA at the ancestral 
site on chromosomes descended from Anc_8A, whereas in 
the other three species, the Anc_8A region has become rear- 
ranged and the rDNA is now at a telomere. 
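
The classification used in this section (ancestral site, telomeric site, or new internal site) can be sketched as a lookup on a chromosome's gene order. The example below is a minimal illustration, not the procedure used with the Yeast Gene Order Browser: the function name, the `rDNA` placeholder token, and the short gene-order lists are all hypothetical, and only the flanking gene identifiers Anc_8.371 and Anc_8.372 come from the text. A real analysis would also have to allow for local inversions such as the one noted for K. lactis.

```python
from typing import List

ANCESTRAL_FLANKS = {"Anc_8.371", "Anc_8.372"}  # ARG82 and HMO1 (from the text)
RDNA = "rDNA"  # hypothetical placeholder token marking the array


def classify_rdna_site(gene_order: List[str]) -> str:
    """Classify the rDNA position on one chromosome's gene-order list.

    Returns "ancestral" if the array sits directly between the two ancestral
    flanking genes, "telomeric" if it is the first or last element,
    "internal (new)" for any other position, and "absent" if there is no array.
    """
    if RDNA not in gene_order:
        return "absent"
    i = gene_order.index(RDNA)
    if i == 0 or i == len(gene_order) - 1:
        return "telomeric"
    neighbours = {gene_order[i - 1], gene_order[i + 1]}
    if neighbours == ANCESTRAL_FLANKS:
        return "ancestral"
    return "internal (new)"


# Hypothetical, heavily shortened gene orders for illustration only.
examples = {
    "ancestral-like": ["Anc_8.370", "Anc_8.371", RDNA, "Anc_8.372", "Anc_8.373"],
    "telomeric-like": [RDNA, "Anc_7.1", "Anc_7.2"],
    "relocated-like": ["Anc_1.349", "tRNA-Tyr", RDNA, "Anc_1.372"],
    "deleted-like": ["Anc_8.371", "Anc_8.372"],
}
for label, order in examples.items():
    print(label, "->", classify_rdna_site(order))
```

Treating the end of a list as a telomere is a simplification that holds only when the list represents a complete chromosome rather than a scaffold.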

We can infer that rDNA arrays have been completely de- 
leted from the ancestral rDNA site on three occasions, marked 
by X symbols in figure 1. In each of these events, the rDNA
was deleted without causing a break of synteny in the region. 
One event is the loss of rDNA from Anc_8B after WGD, which 
must have happened quickly after WGD because it is shared 
by all nine post-WGD species. Deletion of the rDNA from
Anc_8B occurred without disturbing the flanking genes
ARG82 and HMO1 (which were not retained in
duplicate on Anc_8A), and the intergenic distance between
ARG82 and HMO1 is now less than 2 kb in each of these nine
species. A second deletion from the ancestral site occurred in
the non-WGD species Torulaspora delbrueckii, and a third
occurred within the genus Lachancea as described later.

Fig. 1. Comparison of rDNA locations in yeast species. The black arrows represent the rDNA array, where present, with the direction showing the
orientation of the 35S genes. Dots represent protein-coding genes, identified by their numbering in the Ancestral genome (e.g., Anc_8.364), which can be
viewed using the YGOB browser (Byrne and Wolfe 2005). X indicates an inferred deletion of rDNA from the ancestral location between Anc_8.371 and
Anc_8.372. "Telo" indicates that rDNA is now at a telomeric location. Broken horizontal lines indicate disruptions of synteny. Letters Q and Y indicate
tRNA-Gln and tRNA-Tyr genes, respectively. Genes that do not have Ancestral numbers (i.e., genes that are not at orthologous locations in post-WGD and
non-WGD species) are not shown.



New rDNA Locations 

In Lachancea, rDNA was deleted from the ancestral site in the 
common ancestor of L. thermotolerans and L. waltii after it 
had diverged from L. kluyveri (fig. 1). This event is interesting 
because the rDNA seems to have simply transposed out of 
one internal chromosomal site and into another. The new 
location in L. thermotolerans and L. waltii is on Ancestral
chromosome 1, in the interval between genes Anc_1.349
(CDC14) and Anc_1.372 (TLG2), which also contains a
tRNA-Tyr gene. The CDC14 and TLG2 genes are neighbors,
separated by less than 3 kb, in L. kluyveri and the outgroup
species E. gossypii (fig. 1). Note that their names in the
Ancestral gene numbering system are not consecutive
simply because that system refers to the gene order that existed
at the point marked "WGD" in figure 1, which has some
rearrangements relative to the gene order that existed in the
common ancestor of the Kluyveromyces/Eremothecium/
Lachancea clade (Gordon et al. 2009).

Table 1

Location of rDNA in Species with Telomeric Arrays

Species                    Chromosome(a)  Nearest Ancestral Gene  Telomeric in Anc?  rDNA Orientation(b)
Candida glabrata           12R            Anc_8.2                 Yes                Cen
Candida glabrata           13R            Anc_5.1                 Yes                Cen
Tetrapisispora phaffii     9L             Anc_8.877               Yes                Cen
Tetrapisispora blattae     4L             Anc_2.83                No                 Tel
Tetrapisispora blattae     5R             Anc_4.389               Yes                Tel
Torulaspora delbrueckii    8R             Anc_7.1                 Yes                Cen
Zygosaccharomyces rouxii   5R             Anc_3.581               Yes                Cen

(a) L and R refer, respectively, to the low- and high-numbered ends of chromosomes in the genome sequence.
(b) Cen and Tel indicate transcription of the 35S rRNA gene toward the centromere or telomere, respectively.

The new rDNA site in Lachancea is internal to a chromo- 
some. In contrast, in other taxa, the rDNA can be inferred to 
have moved from the ancestral site to a subtelomeric location 
on at least four separate occasions: in the terminal branches 
leading to C. glabrata, Tetrapisispora phaffii, and T. blattae 
and in the common ancestor of T. delbrueckii and Zygo- 
saccharomyces rouxii. Alternatively, there may have been 
five events if the relocations in the latter two species occurred 
separately. The telomeric locations in these species all appear 
to be unrelated to one another, based on the Ancestral genes 
closest to them (table 1). However, most of them (6 of 7) 
correspond to telomeres in the Ancestral genome. Except 
for T. delbrueckii, all the species with telomeric rDNA also 
show rearrangements at the Ancestral site, but we cannot 
tell whether these rearrangements were somehow involved 
in moving the rDNA to a telomere. 

No Sequence Features at Sites of rDNA Loss or Gain 

We examined the DNA sequences of all the intergenic regions 
that correspond to sites from which rDNA has been deleted 
(marked X in fig. 1). These regions range from 170 bp 
(T. delbrueckii) to 1,803 bp (V. polyspora), which contrasts
starkly with their previous length of more than a megabase. 
None of these intergenic regions contains a pseudogene of 
rDNA or other unusual sequence features. Similarly, there are 
no obvious features in the intergenic regions between the 
tRNA-Tyr gene and TLG2 in E. gossypii (121 bp) and L. kluyveri
(2,817 bp), which are orthologous and colinear with the rDNA
integration site in L. thermotolerans and L. waltii. 

Functions of Genes beside the rDNA Locus 

Rather surprisingly, one of the genes, HMO1, located
beside the ancestral rDNA site in non-WGD species codes 
for a protein that is intimately involved in the correct function- 
ing of the rDNA array. The S. cerevisiae rDNA occupies the 
nucleolus and is composed of chromatin with an unorthodox 
structure (Birch and Zomerdijk 2008). Two different rDNA 
chromatin states exist, called "open" and "closed" (Wittner 
et al. 2011), and rDNA arrays consist of a mixture of open and
closed units. Open rDNA is actively transcribed by RNA poly- 
merase I. This DNA is largely devoid of histones and is instead 
associated with Hmo1, an HMG-domain DNA-binding protein 
that has no other known functions (Merz et al. 2008). Closed 
rDNA is not transcribed. It contains nucleosomes whose his- 
tones are deacetylated by Sir2 protein, which suppresses ille- 
gitimate recombination between different units within the 
array and so prevents collapse of the array (Kobayashi et al. 
2004). The open state is necessary for the production of ribo- 
somes, but the closed state is essential for genome replication 
and stability (Aragon 2010). 

At the new rDNA location in L. thermotolerans and L. waltii, 
one of the neighboring genes — CDC14 — also has a functional 
connection to the rDNA array. The balance between the two 
chromatin states in an array is a dynamic equilibrium: 
Nucleosomes are deposited after DNA replication, forming 
closed chromatin, but once transcription is activated, the nu- 
cleosomes are replaced by Hmo1 and open chromatin until 
the next cycle of replication (Wittner et al. 2011). After DNA
replication, the rDNA locus is the last point in the genome at 
which sister chromatids remain attached before they segre- 
gate (Sullivan et al. 2004). Their separation in anaphase is 
triggered by Cdc14, the mitotic exit phosphatase. Cdc14 is 
required for separation of the replicated rDNA, recruitment of 
condensin, and inhibition of RNA polymerase I transcription 
(Sullivan et al. 2004; Clemente-Blanco et al. 2009).
Discussion

A chromosome conformation capture study of the 
three-dimensional organization of the genome in interphase 
S. cerevisiae nuclei showed that, although there are extensive
intrachromosomal physical interactions between all parts of 
other chromosomes, the rDNA array almost completely blocks 
all physical interaction between the parts of chromosome XII 
to its left and right, dividing this chromosome into three phys- 
ical domains (Duan et al. 2010). Therefore, evolutionary 
transposition of the rDNA array from one site in the 
genome to another is predicted to dramatically reorganize 
both its old and new host chromosomes within the nucleus, 
with possible implications for gene regulation on both 
chromosomes. 

How can rDNA move within a genome? In short we do not 
know, but we can suggest two hypotheses. One hypothesis is 
that the mechanism by which the rDNA replicates can lead to 
movement, because numerous double-strand DNA breaks 
(DSBs) are formed in the array during every cycle of replication 
(Kobayashi et al. 2004). There is an origin of replication up- 
stream of the divergently transcribed 5S and 35S genes in 
every unit, but not all origins fire (fig. 2). A replication fork 
barrier (RFB) downstream of the 35S gene only allows replica- 
tion forks to pass in one direction, the same direction as the 
35S gene is transcribed (Brewer and Fangman 1988; Linskens 
and Huberman 1988). The RFB is the binding site for the pro- 
tein Fob1 (Kobayashi 2003). When an origin of replication 
fires, the replication fork traveling through the 5S gene will 
soon arrive at an RFB and a DSB will form. This break will not 
be repaired until a fork moving in the opposite direction, 
which has traveled much further — probably through several 
rDNA units — meets it (fig. 2). The formation of DSBs also pro- 
vides a mechanism for the rDNA array to expand or contract 
by unequal sister chromatid exchange, in which a different 
unit in the array is used as a template for repair (Kobayashi 
et al. 1998; Ganley et al. 2009). We speculate that these DSBs
in rDNA could sometimes interact with other sites of sponta- 
neous DSB in the genome, leading to genomic rearrangement 
and movement of part of the rDNA array to a new site. 
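
The fork geometry described in this paragraph can be made concrete with a toy model. The sketch below assumes the layout of figure 2: units are indexed left to right, the rightward fork from an active origin stalls at the first RFB it reaches, and the leftward fork from the next active origin passes every RFB until it meets that stalled fork. The function name and the unit indexing are illustrative assumptions; the model ignores fork speed, firing times, and the ends of the array.

```python
from typing import List, Tuple


def fork_meeting_points(active_origins: List[int]) -> List[Tuple[int, int, int]]:
    """For each pair of neighbouring active origins, report where forks meet.

    The rightward fork from the origin in unit j is assumed to stall at the
    RFB on the boundary between unit j and unit j + 1. The leftward fork from
    the next active origin, in unit i > j, is assumed to pass every RFB until
    it reaches that stalled fork.

    Returns tuples (j, i, whole_units_crossed), where whole_units_crossed is
    the number of complete units (j + 1 .. i - 1) replicated by the leftward
    fork before the two forks meet.
    """
    origins = sorted(set(active_origins))
    return [(j, i, i - j - 1) for j, i in zip(origins, origins[1:])]


# Hypothetical example: origins fire in units 2, 5, and 6 of an array.
for stalled_unit, source_unit, crossed in fork_meeting_points([2, 5, 6]):
    print(f"fork stalled right of unit {stalled_unit} is met by the fork "
          f"from unit {source_unit} after {crossed} whole unit(s)")
```

In this simplified picture, the sparser the active origins, the further the leftward fork must travel before it meets a stalled fork at an RFB.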

Frequent formation of DSBs in the rDNA array may also 
explain another apparent property of the locus: a propensity 
to take up extraneous genes or DNA. The map of genes flank- 
ing the rDNA in figure 1 only shows those genes whose loca- 
tion is conserved across multiple species. Many species-specific 
or clade-specific genes near the rDNA have been omitted for 
clarity. For example, in S. cerevisiae, there are four copies of 
the species-specific gene ASP3 between the rDNA and MAS1, 
and ASP3 appears to have been horizontally transferred into 
S. cerevisiae from Wickerhamomyces (League et al. 2012). On
the other side of the S. cerevisiae rDNA, between the
tRNA-Gln gene and ACS2, is the gene RNH203, which was
relocated to this site in Saccharomyces after being expelled
from the MAT locus (Gordon et al. 2011). In S. bayanus, there
are fragments of a linear DNA plasmid between RNH203 and
the rDNA (Frank and Wolfe 2009).

Fig. 2. Mode of replication of the Saccharomyces cerevisiae rDNA
array (modified from Brewer and Fangman [1988] and Ganley et al.
[2009]). An origin of replication (ORI) is located between the 35S and 5S
genes in each unit, but many origins are inactive (gray). Replication forks
moving rightward cannot pass through the RFB, but forks moving leftward
can pass. A DSB is formed when a replication fork stalls at the RFB and is
repaired when a fork moving in the other direction meets it.

Alternatively, a second hypothesis is that circular interme- 
diates may be involved in the mobility of rDNA. It has been 
shown experimentally that rDNA units can "pop out" of the 
S. cerevisiae rDNA array by intramolecular recombination
between different units in the array, forming a 9.1-kb circular
DNA molecule or multimers of this structure (Sinclair and 
Guarente 1997; Poole et al. 2012). These extrachromosomal 
rDNA circles (ERCs) are capable of replication because each 
rDNA unit contains an origin of replication. Although there is 
no experimental evidence that ERCs can reintegrate back into 
the genome at new sites, there is evidence that other extra- 
chromosomal elements, including circular molecules such as 
plasmids, can become integrated into the genome at sites of 
DSBs (Ricchetti et al. 1999; Frank and Wolfe 2009; Borneman 
et al. 2011; Galeote et al. 2011). Thus, if a multimeric ERC
containing at least two rDNA units became integrated at a 
DSB site somewhere in the genome, a second rDNA array 
could develop at that locus. 

These hypotheses suggest ways that a new rDNA array 
could begin to form at a second site in the genome, but 
they do not suggest a mechanism for how rDNA could be 
completely lost from the original site. We suggest that the 
presence of two rDNA arrays at different sites in a genome 
is deleterious, unless they are both telomeric (as seen in 
C. glabrata and T. blattae, and possibly V. polyspora). For
one thing, the two arrays would be expected to recombine 
with each other, leading to chromosomal translocations 
(Belloch et al. 2009). There may also be other factors that 
make the presence of two rDNA arrays deleterious (Morales 
and Dujon 2012). The lager yeast S. pastorianus is an
interspecies hybrid between S. cerevisiae and S. eubayanus,
which when formed would have had two versions of
chromosome XII with rDNA arrays that were quite divergent
in sequence (Libkind et al. 2011). In the hybrid, the S. eubayanus-derived
rDNA has collapsed to just 18 kb, whereas
the S. cerevisiae array remains full sized (Nakao et al. 2009). 
A similar uniparental loss of rDNA in an interspecies 
yeast hybrid was also observed in Millerozyma sorbitophila,
a member of the CTG clade (Leh Louis et al. 2012): One 
parental cluster is complete and repeated in tandem 73 
times, whereas the other parental rDNA is only represented 
by two short incomplete rDNA relics located in highly poly- 
morphic subtelomeric regions. In contrast, recently formed 
hybrid Zygosaccharomyces species appear to maintain both 
parental types of rDNA (Solieri et al. 2007; Gordon and 
Wolfe 2008). 

We also examined the location of rDNA arrays in the 
Candida clade of species using CGOB (Fitzpatrick et al. 
2010), but the results were inconclusive. We found four dif- 
ferent nontelomeric rDNA locations among these species, 
which shows that their rDNA is mobile, but we were unable 
to infer which location is ancestral to the clade. Also, because 
some of these sites were relatively close to one another (<100
genes apart), we were unable to infer whether some rDNA 
movements were due to long-distance transposition or to 
local rearrangement of a chromosomal region. We did not 
find any protein-coding genes with rDNA-related functions 
beside the rDNA genes of Candida species. 

It is difficult to assess the statistical significance of finding 
the nucleolar protein genes HMO1 and CDC14 beside rDNA
arrays. In S. cerevisiae, 178 proteins (3%) are annotated as
being localized in the nucleolus (Christie et al. 2009). 
However, many of these proteins are involved in processing 
rRNA precursor transcripts. It is interesting that both HMO1
and CDC14 have functions that are connected to the chroma- 
tin structure of the rDNA array, not to rRNA processing. Both 
HMO1 and CDC14 are transcribed in the direction away from
the rDNA, so it is possible that their promoters are sensitive to 
rDNA chromatin structure in the species where they are lo- 
cated beside rDNA. We could therefore speculate that the 
location of the rDNA within the genome may be constrained 
by natural selection associated with the correct regulation of 
the neighboring protein-coding genes. However, we should 
also note that the linkage between HMO1 and the rDNA has
been broken several times, including by WGD (fig. 1). There 
also does not appear to be any functional connection between 
rDNA and the flanking genes on the other side, ARG82 
(inositol polyphosphate kinase) at the ancestral location and 
TLG2 (a SNARE protein involved in membrane fusion) at the 
new site in Lachancea. Experimental studies on the 
regulation of HMO1 and CDC14 in non-WGD species will
be needed to assess the significance of their colocation with 
the rDNA array. 